Principal component analysis for interval-valued observations
Abstract
One feature of contemporary datasets is that, instead of the single point value in the p-dimensional space ℝ^p seen in classical data, the data may take interval values, thus producing hypercubes in ℝ^p. This paper studies the vertices principal components methodology for interval-valued data and provides enhancements to allow for so-called ‘trivial’ intervals and generalized weight functions. It also introduces the concept of vertex contributions to the underlying principal components, a concept not possible for classical data, but one which provides a visualization method that further aids in the interpretation of the methodology. The method is illustrated on a dataset using measurements of facial characteristics obtained from a study of face recognition patterns for surveillance purposes. A comparison with analyses in which classical surrogates replace the intervals shows how the symbolic analysis gives more informative conclusions. A second example illustrates how the method can be applied even when the number of parameters exceeds the number of observations, as well as how uncertainty data can be accommodated. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 229–246, 2011
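As a rough illustration of the idea described in the abstract, the sketch below builds the 2^p vertices of each observation's hypercube and runs a classical PCA on the stacked vertices. It is a minimal sketch under stated assumptions, not the paper's method: the function names `interval_vertices` and `vertices_pca` are hypothetical, all vertices are weighted equally, each interval-valued score is taken as the minimum and maximum of the projected vertices, and the enhancements mentioned in the abstract (trivial intervals, generalized weights, vertex contributions) are not attempted.

```python
import itertools
import numpy as np


def interval_vertices(lower, upper):
    """Return the 2**p vertices of the hypercube with sides [lower_j, upper_j]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    bounds = np.stack([lower, upper], axis=1)  # shape (p, 2)
    # Cartesian product of the two endpoints of every variable.
    return np.array(list(itertools.product(*bounds)))


def vertices_pca(lowers, uppers):
    """Classical PCA on the stacked vertices of all hypercubes (equal weights).

    Assumption: each interval score is [min, max] of the projected vertices.
    """
    blocks = [interval_vertices(lo, up) for lo, up in zip(lowers, uppers)]
    stacked = np.vstack(blocks)  # shape (n * 2**p, p)
    mean = stacked.mean(axis=0)
    cov = np.atleast_2d(np.cov(stacked - mean, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = []
    for block in blocks:
        proj = (block - mean) @ eigvecs
        scores.append(np.stack([proj.min(axis=0), proj.max(axis=0)], axis=1))
    # scores[i, k] = [lower, upper] of observation i on component k
    return eigvals, eigvecs, np.array(scores)


if __name__ == "__main__":
    # Three made-up observations on p = 2 interval-valued variables.
    lowers = [[0.0, 0.0], [2.0, 1.0], [4.0, 3.0]]
    uppers = [[1.0, 1.0], [3.0, 2.0], [5.0, 5.0]]
    eigvals, eigvecs, scores = vertices_pca(lowers, uppers)
    print("eigenvalues:", eigvals)
    print("interval scores on PC1:", scores[:, 0, :])
```

Because every observation contributes 2^p rows, the stacked matrix grows exponentially with the number of variables, so this direct construction is only practical for small p.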
Related resources
Symbolic Principal Components for Interval-valued Observations
One feature of contemporary datasets is that instead of the single point value in the p-dimensional space ℝ^p seen in classical data, the data may take interval values thus producing hypercubes in ℝ^p. This paper extends the methodology of classical principal components to that for interval-valued data. Two methods are proposed, viz., a vertices method which uses all the vertices of the observation...
Symbolic Covariance Matrix for Interval-valued Variables and its Application to Principal Component Analysis: a Case Study
In the last two decades, principal component analysis (PCA) was extended to interval-valued data; several adaptations of the classical approach are known from the literature. Our approach is based on the symbolic covariance matrix Cov for the interval-valued variables proposed by Billard (2008). Its crucial advantage, when compared to other approaches, is that it fully utilizes all the informat...
Principal Curves and Surfaces to Interval Valued Variables
In this paper we propose a generalization to symbolic interval-valued variables of the Principal Curves and Surfaces method proposed by T. Hastie in [4]. Given a data set X with n observations and m continuous variables, the main idea of the Principal Curves and Surfaces method is to generalize the principal component line, providing a smooth one-dimensional curved approximation to a set of data poin...
Extracting Information from Interval Data Using Symbolic Principal Component Analysis
We address the definition of symbolic variance and covariance for random interval-valued variables, and present four known symbolic principal component estimation methods using a common insightful framework. In addition, we provide a simple explicit formula for the scores of the symbolic principal components, equivalent to the representation by Maximum Covering Area Rectangle. Furthermore, the ...
An interval-valued intuitionistic fuzzy principal component analysis model-based method for complex multi-attribute large-group decision-making
The complex multi-attribute large-group decision-making problems that are based on interval-valued intuitionistic fuzzy information have become a common topic of research in the field of decision-making. Due to the complexity of this kind of problem, alternatives are usually described by multiple attributes that exhibit a high degree of interdependence or interactivity. In addition, decision mak...
Journal title:
- Statistical Analysis and Data Mining
Volume 4, Issue
Pages -
Publication date 2011